Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM
Authors
Abstract
This paper describes how non-linear formant trajectories, based on the ‘trajectory HMM’ proposed by Tokuda et al., can be exploited within the framework of multiple-level segmental HMMs. In the resulting model, named a non-linear/linear multiple-level segmental HMM, speech dynamics are modeled as smooth non-linear trajectories in the formant-based intermediate layer. These formant trajectories are mapped into the acoustic layer using a set of one or more linear mappings. The N-best rescoring paradigm is employed to evaluate the performance of the non-linear formant trajectories. Rescoring results on the TIMIT corpus show that introducing non-linear formant trajectories improves phone recognition accuracy compared with linear trajectories.
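The pipeline the abstract outlines (a formant trajectory per hypothesis, a linear mapping into the acoustic layer, then N-best rescoring) can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the single linear map `W`, `b`, the diagonal Gaussian with variances `var`, and the hand-written trajectories are all assumptions made for illustration, and the segmental HMM duration and language-model terms are left out.

```python
import math

def map_to_acoustic(formants, W, b):
    """Apply an assumed linear map y = W f + b to one formant vector."""
    return [sum(w * f for w, f in zip(row, formants)) + bias
            for row, bias in zip(W, b)]

def segment_log_likelihood(observed, trajectory, W, b, var):
    """Diagonal-Gaussian log-likelihood of the observed acoustic frames
    around the linearly mapped formant trajectory (hypothetical scoring)."""
    total = 0.0
    for obs, formants in zip(observed, trajectory):
        pred = map_to_acoustic(formants, W, b)
        for o, p, v in zip(obs, pred, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (o - p) ** 2 / v)
    return total

def rescore_nbest(hypotheses, score_fn):
    """Return the hypothesis in the N-best list with the highest new score."""
    return max(hypotheses, key=score_fn)

# Toy data: two acoustic frames and two competing hypotheses, each carrying
# the formant trajectory its model predicts (values are invented).
W = [[1.0, 0.0], [0.0, 1.0]]   # identity map, assumed for simplicity
b = [0.0, 0.0]
var = [1.0, 1.0]
observed = [[1.0, 2.0], [2.0, 3.0]]
nbest = [
    {"label": "A", "trajectory": [[1.0, 2.0], [2.0, 3.0]]},
    {"label": "B", "trajectory": [[0.0, 0.0], [0.0, 0.0]]},
]

best = rescore_nbest(
    nbest,
    lambda h: segment_log_likelihood(observed, h["trajectory"], W, b, var),
)
```

In this sketch, hypothesis `A` is selected because its mapped trajectory passes through the observed frames exactly. A real system would derive the trajectories from the model parameters rather than store them with each hypothesis.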
Similar resources
Towards an improved model of dynamics for speech recognition and synthesis
This thesis describes research on the use of non-linear formant trajectories to model speech dynamics within the framework of a multiple-level segmental hidden Markov model (MSHMM). The particular type of intermediate-layer model investigated in this study is based on the 12-dimensional parallel formant synthesiser (PFS) control parameters, which can be directly used to synthesise speech wit...
Models of Speech Dynamics in a Seg Using Intermediate Linear
A theoretical and experimental analysis of a simple multilevel segmental HMM is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate (articulatory) layer, where speech dynamics are modeled using linear trajectories. Three formant-based parameterizations and measured articulatory positions are considered ...
The effect of an intermediate articulatory layer on the performance of a segmental HMM
We present a novel multi-level HMM in which an intermediate ‘articulatory’ representation is included between the state and surface-acoustic levels. A potential difficulty with such a model is that advantages gained by the introduction of an articulatory layer might be compromised by limitations due to an insufficiently rich articulatory representation, or by compromises made for mathematical o...
Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis
This paper describes the use of non-linear formant trajectories to model speech dynamics. The performance of the non-linear formant dynamics model is evaluated using HMM-based speech synthesis experiments, in which the 12-dimensional parallel formant synthesiser control parameters and their time derivatives are used as the feature vectors in the HMM. Two types of formant synthesiser control par...
Modelling Speech Signals using Formant Frequencies as an Intermediate Representation
This paper concerns Multiple-level Segmental Hidden Markov Models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation. New TIMIT phone recognition results are presented, confirming that the theoretical upper-bound on performance is achieved provided that either the intermediate representation or t...